1.
J Med Internet Res ; 26: e50890, 2024 Jan 30.
Article in English | MEDLINE | ID: mdl-38289657

ABSTRACT

Machine learning (ML) has seen impressive growth in health science research due to its capacity for handling complex data to perform a range of tasks, including unsupervised learning, supervised learning, and reinforcement learning. To aid health science researchers in understanding the strengths and limitations of ML and to facilitate its integration into their studies, we present here a guideline for integrating ML into an analysis through a structured framework, covering steps from framing a research question to study design and analysis techniques for specialized data types.


Subjects
Machine Learning , Reinforcement, Psychology , Humans , Research Design , Research Personnel
2.
Brief Bioinform ; 24(5)2023 09 20.
Article in English | MEDLINE | ID: mdl-37738402

ABSTRACT

Understanding the function of the human microbiome is important but the development of statistical methods specifically for microbial gene expression (i.e. metatranscriptomics) is in its infancy. Many currently employed differential expression analysis methods have been designed for different data types and have not been evaluated in metatranscriptomics settings. To address this gap, we undertook a comprehensive evaluation and benchmarking of 10 differential analysis methods for metatranscriptomics data. We used a combination of real and simulated data to evaluate performance (i.e. type I error, false discovery rate and sensitivity) of the following methods: log-normal (LN), logistic-beta (LB), MAST, DESeq2, metagenomeSeq, ANCOM-BC, LEfSe, ALDEx2, Kruskal-Wallis and two-part Kruskal-Wallis. The simulation was informed by supragingival biofilm microbiome data from 300 preschool-age children enrolled in a study of childhood dental disease (early childhood caries, ECC), whereas validations were sought in two additional datasets from the ECC study and an inflammatory bowel disease study. The LB test showed the highest sensitivity in both small and large samples and reasonably controlled type I error. In contrast, MAST was hampered by inflated type I error. Upon application of the LN and LB tests in the ECC study, we found that genes C8PHV7 and C8PEV7, harbored by the lactate-producing Campylobacter gracilis, had the strongest association with childhood dental disease. This comprehensive model evaluation offers practical guidance for selection of appropriate methods for rigorous analyses of differential expression in metatranscriptomics. Selection of an optimal method increases the possibility of detecting true signals while minimizing the chance of claiming false ones.


Subjects
Benchmarking , Stomatognathic Diseases , Child , Humans , Child, Preschool , Biofilms , Computer Simulation , Lactic Acid
3.
Stat Methods Med Res ; 32(7): 1300-1317, 2023 07.
Article in English | MEDLINE | ID: mdl-37167422

ABSTRACT

The zero-inflated negative binomial distribution has been widely used for count data analyses in various biomedical settings due to its capacity for modeling excess zeros and overdispersion. When there are correlated count variables, a bivariate model is essential for understanding their full distributional features. Examples include measuring correlation of two genes in sparse single-cell RNA sequencing data and modeling dental caries count indices on two different tooth surface types. For these purposes, we develop a richly parametrized bivariate zero-inflated negative binomial model that has a simple latent variable framework and eight free parameters with intuitive interpretations. In the scRNA-seq data example, the correlation is estimated after adjusting for the effects of dropout events represented by excess zeros. In the dental caries data, we analyze how the treatment with Xylitol lozenges affects the marginal mean and other patterns of response manifested in the two dental caries traits. An R package "bzinb" is available on the Comprehensive R Archive Network.


Subjects
Dental Caries , Humans , Models, Statistical , Binomial Distribution , Data Analysis , Poisson Distribution
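
As a rough illustration of the excess zeros and overdispersion described in the abstract above, the following Python sketch evaluates a univariate zero-inflated negative binomial probability mass function. It is only a building block under assumed parameter names (`mean`, `dispersion`, `zero_prob`); it is not the bivariate eight-parameter model of the article and does not reproduce the interface of the "bzinb" R package.

```python
import math


def nb_pmf(k: int, mean: float, dispersion: float) -> float:
    """Negative binomial pmf parametrized by its mean and a size parameter.

    With size r = dispersion, the variance is mean + mean**2 / r,
    so smaller r means stronger overdispersion. Requires mean > 0.
    """
    r = dispersion
    p = r / (r + mean)  # "success" probability in the classical parametrization
    log_pmf = (
        math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
        + r * math.log(p) + k * math.log1p(-p)
    )
    return math.exp(log_pmf)


def zinb_pmf(k: int, mean: float, dispersion: float, zero_prob: float) -> float:
    """Zero-inflated negative binomial pmf.

    With probability zero_prob the count is a structural zero (for example a
    dropout event); otherwise it is drawn from the negative binomial component.
    """
    base = (1.0 - zero_prob) * nb_pmf(k, mean, dispersion)
    return zero_prob + base if k == 0 else base
```

Under these assumptions the marginal mean is `(1 - zero_prob) * mean`, which is one way to see why ignoring the zero-inflation component biases estimates computed from the raw counts.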
4.
Biometrika ; 110(2): 395-410, 2023 Jun.
Article in English | MEDLINE | ID: mdl-37197739

ABSTRACT

We propose a reinforcement learning method for estimating an optimal dynamic treatment regime for survival outcomes with dependent censoring. The estimator allows the failure time to be conditionally independent of censoring and dependent on the treatment decision times, supports a flexible number of treatment arms and treatment stages, and can maximize either the mean survival time or the survival probability at a certain time-point. The estimator is constructed using generalized random survival forests and can have polynomial rates of convergence. Simulations and analysis of the Atherosclerosis Risk in Communities study data suggest that the new estimator brings higher expected outcomes than existing methods in various settings.

5.
Nat Commun ; 14(1): 2919, 2023 05 22.
Article in English | MEDLINE | ID: mdl-37217495

ABSTRACT

Streptococcus mutans has been implicated as the primary pathogen in childhood caries (tooth decay). While the role of polymicrobial communities is appreciated, it remains unclear whether other microorganisms are active contributors or interact with pathogens. Here, we integrate multi-omics of supragingival biofilm (dental plaque) from 416 preschool-age children (208 males and 208 females) in a discovery-validation pipeline to identify disease-relevant inter-species interactions. Sixteen taxa associate with childhood caries in metagenomics-metatranscriptomics analyses. Using multiscale/computational imaging and virulence assays, we examine biofilm formation dynamics, spatial arrangement, and metabolic activity of Selenomonas sputigena, Prevotella salivae and Leptotrichia wadei, either individually or with S. mutans. We show that S. sputigena, a flagellated anaerobe with previously unknown role in supragingival biofilm, becomes trapped in streptococcal exoglucans, loses motility but actively proliferates to build a honeycomb-like multicellular-superstructure encapsulating S. mutans, enhancing acidogenesis. Rodent model experiments reveal an unrecognized ability of S. sputigena to colonize supragingival tooth surfaces. While incapable of causing caries on its own, when co-infected with S. mutans, S. sputigena causes extensive tooth enamel lesions and exacerbates disease severity in vivo. In summary, we discover a pathobiont cooperating with a known pathogen to build a unique spatial structure and heighten biofilm virulence in a prevalent human disease.


Subjects
Dental Caries Susceptibility , Streptococcus mutans , Male , Child , Female , Humans , Child, Preschool , Virulence , Streptococcus mutans/genetics , Biofilms
6.
Microorganisms ; 11(3)2023 Mar 16.
Article in English | MEDLINE | ID: mdl-36985339

ABSTRACT

Integration of multi-omics data is a challenging but necessary step to advance our understanding of the biology underlying human health and disease processes. To date, investigations seeking to integrate multi-omics (e.g., microbiome and metabolome) employ simple correlation-based network analyses; however, these methods are not always well-suited for microbiome analyses because they do not accommodate the excess zeros typically present in these data. In this paper, we introduce a bivariate zero-inflated negative binomial (BZINB) model-based network and module analysis method that addresses this limitation and improves microbiome-metabolome correlation-based model fitting by accommodating excess zeros. We use real and simulated data based on a multi-omics study of childhood oral health (ZOE 2.0; investigating early childhood dental caries, ECC) and find that the accuracy of the BZINB model-based correlation method is superior compared to Spearman's rank and Pearson correlations in terms of approximating the underlying relationships between microbial taxa and metabolites. The new method, BZINB-iMMPath, facilitates the construction of metabolite-species and species-species correlation networks using BZINB and identifies modules of (i.e., correlated) species by combining BZINB and similarity-based clustering. Perturbations in correlation networks and modules can be efficiently tested between groups (i.e., healthy and diseased study participants). Upon application of the new method in the ZOE 2.0 study microbiome-metabolome data, we identify that several biologically-relevant correlations of ECC-associated microbial taxa with carbohydrate metabolites differ between healthy and dental caries-affected participants. 
In sum, we find that the BZINB model is a useful alternative to Spearman or Pearson correlations for estimating the underlying correlation of zero-inflated bivariate count data and thus is suitable for integrative analyses of multi-omics data such as those encountered in microbiome and metabolome studies.

7.
bioRxiv ; 2023 Feb 01.
Article in English | MEDLINE | ID: mdl-36778424

ABSTRACT

Integration of multi-omics data is a challenging but necessary step to advance our understanding of the biology underlying human health and disease processes. To date, investigations seeking to integrate multi-omics (e.g., microbiome and metabolome) employ simple correlation-based network analyses; however, these methods are not always well-suited for microbiome analyses because they do not accommodate the excess zeros typically present in these data. In this paper, we introduce a bivariate zero-inflated negative binomial (BZINB) model-based network and module analysis method that addresses this limitation and improves microbiome-metabolome correlation-based model fitting by accommodating excess zeros. We use real and simulated data based on a multi-omics study of childhood oral health (ZOE 2.0; investigating early childhood dental disease, ECC) and find that the accuracy of the BZINB model-based correlation method is superior compared to Spearman’s rank and Pearson correlations in terms of approximating the underlying relationships between microbial taxa and metabolites. The new method, BZINB-iMMPath, facilitates the construction of metabolite-species and species-species correlation networks using BZINB and identifies modules of (i.e., correlated) species by combining BZINB and similarity-based clustering. Perturbations in correlation networks and modules can be efficiently tested between groups (i.e., healthy and diseased study participants). Upon application of the new method in the ZOE 2.0 study microbiome-metabolome data, we identify that several biologically-relevant correlations of ECC-associated microbial taxa with carbohydrate metabolites differ between healthy and dental caries-affected participants.
In sum, we find that the BZINB model is a useful alternative to Spearman or Pearson correlations for estimating the underlying correlation of zero-inflated bivariate count data and thus is suitable for integrative analyses of multi-omics data such as those encountered in microbiome and metabolome studies.

8.
BMC Med Res Methodol ; 22(1): 328, 2022 12 22.
Article in English | MEDLINE | ID: mdl-36550398

ABSTRACT

BACKGROUND: Precision medicine is an emerging field that involves the selection of treatments based on patients' individual prognostic data. It is formalized through the identification of individualized treatment rules (ITRs) that maximize a clinical outcome. When the type of outcome is time-to-event, the correct handling of censoring is crucial for estimating reliable optimal ITRs. METHODS: We propose a jackknife estimator of the value function to allow for right-censored data for a binary treatment. The jackknife estimator or leave-one-out-cross-validation approach can be used to estimate the value function and select optimal ITRs using existing machine learning methods. We address the issue of censoring in survival data by introducing an inverse probability of censoring weighted (IPCW) adjustment in the expression of the jackknife estimator of the value function. In this paper, we estimate the optimal ITR by using random survival forest (RSF) and Cox proportional hazards model (COX). We use a Z-test to compare the optimal ITRs learned by RSF and COX with the zero-order model (or one-size-fits-all). Through simulation studies, we investigate the asymptotic properties and the performance of our proposed estimator under different censoring rates. We illustrate our proposed method on a phase III clinical trial of non-small cell lung cancer data. RESULTS: Our simulations show that COX outperforms RSF for small sample sizes. As sample sizes increase, the performance of RSF improves, in particular when the expected log failure time is not linear in the covariates. The estimator is fairly normally distributed across different combinations of simulation scenarios and censoring rates. When applied to a non-small-cell lung cancer data set, our method determines the zero-order model (ZOM) as the best performing model. This finding highlights the possibility that tailoring may not be needed for this cancer data set. 
CONCLUSION: The jackknife approach for estimating the value function in the presence of right-censored data shows satisfactory performance when there is small to moderate censoring. Winsorizing the upper and lower percentiles of the estimated survival weights for computing the IPCWs stabilizes the estimator.


Subjects
Carcinoma, Non-Small-Cell Lung , Lung Neoplasms , Humans , Carcinoma, Non-Small-Cell Lung/therapy , Lung Neoplasms/therapy , Proportional Hazards Models , Probability , Prognosis , Computer Simulation , Survival Analysis
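
The inverse probability of censoring weighting (IPCW) step mentioned in the abstract above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the authors' jackknife value estimator: the function names are hypothetical, the censoring distribution is estimated with a plain Kaplan-Meier estimator without covariates, failure and censoring times are assumed not to be tied, and the winsorizing of weights described in the conclusion is omitted.

```python
def censoring_survival_before(times, events):
    """Kaplan-Meier estimate of G(t-) = P(C >= t) at each observed time.

    events[i] is 1 for an observed failure and 0 for a censored time.
    Censoring is treated as the "event" of this Kaplan-Meier estimator.
    Assumes no ties between failure and censoring times.
    """
    distinct = sorted(set(times))
    g_before = []
    for t in times:
        g = 1.0
        for u in distinct:
            if u >= t:
                break  # left limit: only censoring strictly before t counts
            at_risk = sum(1 for s in times if s >= u)
            censored = sum(1 for s, e in zip(times, events) if s == u and e == 0)
            g *= 1.0 - censored / at_risk
        g_before.append(g)
    return g_before


def ipcw_weights(times, events):
    """Weight delta_i / G(T_i-): censored subjects get weight 0, and observed
    failures are up-weighted to stand in for comparable censored subjects."""
    g_before = censoring_survival_before(times, events)
    return [e / g for e, g in zip(events, g_before)]


def ipcw_mean(values, times, events):
    """IPCW estimate of the mean of `values` (e.g. the failure times)."""
    weights = ipcw_weights(times, events)
    return sum(w * v for w, v in zip(weights, values)) / len(values)
```

In a value-function setting, `values` would be the outcomes of subjects whose received treatment agrees with the rule being evaluated; that step is left out here to keep the weighting itself visible.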
9.
J Comput Graph Stat ; 31(2): 390-402, 2022.
Article in English | MEDLINE | ID: mdl-35685204

ABSTRACT

We propose interval censored recursive forests (ICRF), an iterative tree ensemble method for interval censored survival data. This nonparametric regression estimator addresses the splitting bias problem of existing tree-based methods and iteratively updates survival estimates in a self-consistent manner. Consistent splitting rules are developed for interval censored data, convergence is monitored using out-of-bag samples, and kernel-smoothing is applied. The ICRF is uniformly consistent and displays high prediction accuracy in both simulations and applications to avalanche and national mortality data. An R package icrf is available on CRAN and Supplementary Materials for this article are available online.

10.
Front Cell Infect Microbiol ; 11: 734416, 2021.
Article in English | MEDLINE | ID: mdl-34760716

ABSTRACT

Microbiome data are becoming increasingly available in large health cohorts, yet metabolomics data are still scant. While many studies generate microbiome data, they lack matched metabolomics data or have considerable missing proportions of metabolites. Since metabolomics is key to understanding microbial and general biological activities, the possibility of imputing individual metabolites or inferring metabolomics pathways from microbial taxonomy or metagenomics is intriguing. Importantly, current metabolomics profiling methods such as the HMP Unified Metabolic Analysis Network (HUMAnN) have unknown accuracy and are limited in their ability to predict individual metabolites. To address this gap, we developed a novel metabolite prediction method, and we present its application and evaluation in an oral microbiome study. The new method for predicting metabolites using microbiome data (ENVIM) is based on the elastic net model (ENM). ENVIM introduces an extra step to ENM to consider variable importance (VI) scores, and thus, achieves better prediction power. We investigate the metabolite prediction performance of ENVIM using metagenomic and metatranscriptomic data in a supragingival biofilm multi-omics dataset of 289 children ages 3-5 who were participants of a community-based study of early childhood oral health (ZOE 2.0) in North Carolina, United States. We further validate ENVIM in two additional publicly available multi-omics datasets generated from studies of gut health. We select gene family sets based on variable importance scores and modify the existing ENM strategy used in the MelonnPan prediction software to accommodate the unique features of microbiome and metabolome data. We evaluate metagenomic and metatranscriptomic predictors and compare the prediction performance of ENVIM to the standard ENM employed in MelonnPan. 
The newly developed ENVIM method showed higher metabolite prediction accuracy than MelonnPan when trained with metatranscriptomics data only, metagenomics data only, or both. Better metabolite prediction is achieved in the gut microbiome compared with the oral microbiome setting. We report the best-predictable compounds in all three datasets from two different body sites. For example, the metabolites trehalose, maltose, stachyose, and ribose are all well predicted by the supragingival microbiome.


Subjects
Gastrointestinal Microbiome , Microbiota , Child , Child, Preschool , Gastrointestinal Microbiome/genetics , Humans , Metabolome , Metabolomics , Metagenome , Metagenomics
11.
J Oral Microbiol ; 13(1): 1886748, 2021 Jun 16.
Article in English | MEDLINE | ID: mdl-34188775

ABSTRACT

Aim: This in vivo experimental study investigated bacterial microbiome and metabolome longitudinal changes associated with enamel caries lesion progression and arrest. Methods: We induced natural caries activity in three caries-free volunteers prior to four premolar extractions for orthodontic reasons. The experimental model included placement of a modified orthodontic band on smooth surfaces and a mesh on occlusal surfaces. We applied the caries-inducing protocol for 4 and 6 weeks, and subsequently promoted caries lesion arrest via a 2-week toothbrushing period. Lesions were verified clinically and quantitated via micro-CT enamel density measurements. The biofilm microbial composition was determined via 16S rRNA gene Illumina sequencing and NMR spectrometry was used for metabolomics. Results: Biofilm maturation and caries lesion progression were characterized by an increase in Gram-negative anaerobes, including Veillonella and Prevotella. Streptococcus was associated with caries lesion progression, while a more equal distribution of Streptococcus, Bifidobacterium, Atopobium, Prevotella, Veillonella, and Saccharibacteria (TM7) characterized arrest. Lactate, acetate, pyruvate, alanine, valine, and sugars were more abundant in mature biofilms compared to newly formed biofilms. Conclusions: These longitudinal bacterial microbiome and metabolome results provide novel mechanistic insights into the role of the biofilm in caries progression and arrest and offer promising candidate biomarkers for validation in future studies.

12.
Pediatr Dent ; 43(3): 191-197, 2021 May 15.
Article in English | MEDLINE | ID: mdl-34172112

ABSTRACT

Purpose: The purpose of the study was to develop and evaluate an automated machine learning algorithm (AutoML) for children's classification according to early childhood caries (ECC) status. Methods: Clinical, demographic, behavioral, and parent-reported oral health status information for a sample of 6,404 three- to five-year-old children (mean age equals 54 months) participating in an epidemiologic study of early childhood oral health in North Carolina was used. ECC prevalence (decayed, missing, and filled primary teeth surfaces [dmfs] score greater than zero, using an International Caries Detection and Assessment System score greater than or equal to three caries lesion detection threshold) was 54 percent. Ten sets of ECC predictors were evaluated for ECC classification accuracy (i.e., area under the ROC curve [AUC], sensitivity [Se], and positive predictive value [PPV]) using an AutoML deployment on Google Cloud, followed by internal validation and external replication. Results: A parsimonious model including two terms (i.e., children's age and parent-reported child oral health status: excellent/very good/good/fair/poor) had the highest AUC (0.74), Se (0.67), and PPV (0.64) scores and similar performance using an external National Health and Nutrition Examination Survey (NHANES) dataset (AUC equals 0.80, Se equals 0.73, PPV equals 0.49). Contrarily, a comprehensive model with 12 variables covering demographics (e.g., race/ethnicity, parental education), oral health behaviors, fluoride exposure, and dental home had worse performance (AUC equals 0.66, Se equals 0.54, PPV equals 0.61). Conclusions: Parsimonious automated machine learning early childhood caries classifiers, including single-item self-reports, can be valuable for ECC screening. The classifier can accommodate biological information that can help improve its performance in the future.


Subjects
Dental Caries Susceptibility , Dental Caries , Child , Child, Preschool , Humans , Machine Learning , North Carolina , Nutrition Surveys , Prevalence
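
For readers less familiar with the classification metrics reported in the abstract above, the following Python sketch computes sensitivity (Se) and positive predictive value (PPV) from a confusion matrix, and shows how PPV depends on prevalence. The function names and the counts in the usage note are hypothetical and are not taken from the study.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Proportion of truly affected children that the classifier flags."""
    return tp / (tp + fn)


def positive_predictive_value(tp: int, fp: int) -> float:
    """Proportion of flagged children who are truly affected."""
    return tp / (tp + fp)


def ppv_from_rates(se: float, sp: float, prevalence: float) -> float:
    """PPV via Bayes' rule from sensitivity, specificity, and prevalence.

    Unlike Se, PPV changes with prevalence, so the same classifier can show
    a different PPV when it is applied to a population with a different
    disease frequency.
    """
    true_pos = se * prevalence
    false_pos = (1.0 - sp) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)
```

With hypothetical counts of 30 true positives, 20 false positives, 10 false negatives, and 40 true negatives, `sensitivity(30, 10)` is 0.75 and `positive_predictive_value(30, 20)` is 0.6, which agrees with `ppv_from_rates(0.75, 40 / 60, 0.4)`.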
13.
Article in English | MEDLINE | ID: mdl-33139633

ABSTRACT

Early childhood caries (ECC) is an aggressive form of dental caries occurring in the first five years of life. Despite its prevalence and consequences, little progress has been made in its prevention and even less is known about individuals' susceptibility or genomic risk factors. The genome-wide association study (GWAS) of ECC ("ZOE 2.0") is a community-based, multi-ethnic, cross-sectional, genetic epidemiologic study seeking to address this knowledge gap. This paper describes the study's design, the cohort's demographic profile, data domains, and key oral health outcomes. Between 2016 and 2019, the study enrolled 8059 3-5-year-old children attending public preschools in North Carolina, United States. Participants resided in 86 of the state's 100 counties and racial/ethnic minorities predominated: for example, 48% (n = 3872) were African American, 22% white, and 20% (n = 1611) were Hispanic/Latino. Seventy-nine percent (n = 6404) of participants underwent clinical dental examinations yielding ECC outcome measures; ECC (defined at the established caries lesion threshold) prevalence was 54% and the mean number of decayed, missing, filled surfaces due to caries was eight. Nearly all (98%) examined children provided sufficient DNA from saliva for genotyping. The cohort's community-based nature and rich data offer excellent opportunities for addressing important clinical, epidemiologic, and biological questions in early childhood.


Subjects
Community Participation , Dental Caries/genetics , Oral Health , Child, Preschool , Cross-Sectional Studies , Dental Caries/epidemiology , Epidemiologic Studies , Female , Genome-Wide Association Study , Humans , Male , North Carolina/epidemiology , Prevalence
14.
Int Stat Rev ; 87(1): 152-177, 2019 Apr.
Article in English | MEDLINE | ID: mdl-31007356

ABSTRACT

Receiver operating characteristic curves are widely used as a measure of accuracy of diagnostic tests and can be summarised using the area under the receiver operating characteristic curve (AUC). Often, it is useful to construct a confidence interval for the AUC; however, because there are a number of different proposed methods to measure variance of the AUC, there are thus many different resulting methods for constructing these intervals. In this article, we compare different methods of constructing Wald-type confidence intervals in the presence of missing data where the missingness mechanism is ignorable. We find that constructing confidence intervals using multiple imputation based on logistic regression gives the most robust coverage probability and the choice of confidence interval method is less important. However, when missingness rate is less severe (e.g. less than 70%), we recommend using Newcombe's Wald method for constructing confidence intervals along with multiple imputation using predictive mean matching.
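
To make the idea of a Wald-type interval for the AUC concrete, here is a minimal Python sketch for complete data. It uses the Hanley-McNeil variance approximation as one familiar choice; this is an assumption made for illustration and is not necessarily any of the variance estimators compared in the article, nor does the sketch cover missing data or multiple imputation. Function names are hypothetical.

```python
import math
from statistics import NormalDist


def auc_mann_whitney(pos, neg):
    """AUC as the proportion of (positive, negative) pairs ranked correctly,
    counting ties as one half."""
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auc_wald_ci(pos, neg, level=0.95):
    """Wald-type interval: AUC +/- z * SE, truncated to [0, 1].

    SE comes from the Hanley-McNeil approximation (an assumed choice).
    """
    a = auc_mann_whitney(pos, neg)
    n1, n0 = len(pos), len(neg)
    q1 = a / (2.0 - a)            # two positives both outrank one negative
    q2 = 2.0 * a * a / (1.0 + a)  # one positive outranks two negatives
    var = (a * (1.0 - a) + (n1 - 1) * (q1 - a * a) + (n0 - 1) * (q2 - a * a)) / (n1 * n0)
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z * math.sqrt(max(var, 0.0))
    return a, max(0.0, a - half_width), min(1.0, a + half_width)
```

The truncation to [0, 1] is a known weakness of Wald-type intervals near the boundary, which is part of why the choice of variance estimator and interval method matters for coverage.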

15.
Methods Mol Biol ; 1922: 525-548, 2019.
Article in English | MEDLINE | ID: mdl-30838598

ABSTRACT

Early childhood caries (ECC) is a biofilm-mediated disease. Social, environmental, and behavioral determinants as well as innate susceptibility are major influences on its incidence; however, from a pathogenetic standpoint, the disease is defined and driven by oral dysbiosis. In other words, the disease occurs when the natural equilibrium between the host and its oral microbiome shifts toward states that promote demineralization at the biofilm-tooth surface interface. Thus, a comprehensive understanding of dental caries as a disease requires the characterization of both the composition and the function or metabolic activity of the supragingival biofilm according to well-defined clinical statuses. However, taxonomic and functional information of the supragingival biofilm is rarely available in clinical cohorts, and its collection presents unique challenges among very young children. This paper presents a protocol and pipelines available for the conduct of supragingival biofilm microbiome studies among children in the primary dentition, that has been designed in the context of a large-scale population-based genetic epidemiologic study of ECC. The protocol is being developed for the collection of two supragingival biofilm samples from the maxillary primary dentition, enabling downstream taxonomic (e.g., metagenomics) and functional (e.g., transcriptomics and metabolomics) analyses. The protocol is being implemented in the assembly of a pediatric precision medicine cohort comprising over 6000 participants to date, contributing social, environmental, behavioral, clinical, and biological data informing ECC and other oral health outcomes.


Subjects
Bacteria/genetics , Biofilms , Dental Caries/microbiology , Metabolomics/methods , Metagenomics/methods , Tooth, Deciduous/microbiology , Bacteria/isolation & purification , Bacteria/metabolism , Child, Preschool , DNA, Bacterial/genetics , Dental Caries/etiology , Gene Expression Profiling/methods , Gingiva/microbiology , Humans , Microbiota , RNA, Bacterial/genetics , Sequence Analysis, DNA/methods , Sequence Analysis, RNA/methods , Software , Specimen Handling/methods , Transcriptome